Intelligent Paper Notes

Split Moves for Monte-Carlo Tree Search

Jakub Kowalski, Maksymilian Mika, Wojciech Pawlik, Jakub Sutowicz, Marek Szykuła, Mark H. M. Winands

Category: Artificial Intelligence

2021-12-14

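In many games, moves consist of several decisions made by the player. These decisions can be viewed as separate moves, which is already a common practice in multi-action games for efficiency reasons. Such a division of a player's move into a sequence of simpler, lower-level moves is called splitting. So far, split moves have been applied only in the aforementioned straightforward cases, and there has been almost no research revealing their impact on agents' playing strength. Taking a knowledge-free perspective, we aim to answer how to use split moves effectively within Monte-Carlo Tree Search (MCTS) and what the practical impact of split design on agents' strength is. This paper proposes a generalization of MCTS that works with arbitrarily split moves. We design several variants of the algorithm and try to measure the impact of split moves separately on efficiency, the quality of MCTS, simulations, and action-based heuristics. The tests are carried out on a set of board games using the Regular Boardgames General Game Playing formalism, in which split strategies of different granularity can be derived automatically from an abstract description of the game. The results give an overview of the behavior of agents that use split design in different ways. We conclude that split design can be highly beneficial for single-action as well as multi-action games.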

translated by Google Translate
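
To make the idea of splitting concrete, here is a minimal Python sketch, not the paper's MCTS generalization. It assumes a hypothetical game in which a full move is a tuple of decisions (piece, destination) and shows how splitting replaces one node with many children by a chain of nodes that each offer a smaller decision. `FULL_MOVES`, `split_children` and `branching_profile` are illustrative names introduced here.

```python
# Hypothetical toy game: a full move is a tuple of decisions (piece, destination).
FULL_MOVES = [
    ("knight", "c3"), ("knight", "f3"), ("knight", "a3"),
    ("pawn", "e4"), ("pawn", "d4"),
    ("bishop", "b5"),
]


def split_children(moves, prefix):
    """Distinct next decisions available after the partial move `prefix`."""
    depth = len(prefix)
    return sorted({m[depth] for m in moves if len(m) > depth and m[:depth] == prefix})


def branching_profile(moves):
    """Map every non-terminal partial move (prefix) to its number of children.

    Without splitting, the root would have len(moves) children; with splitting,
    each node of the tree is a partial move and offers one smaller decision.
    """
    profile = {}
    frontier = [()]
    while frontier:
        prefix = frontier.pop()
        children = split_children(moves, prefix)
        if children:
            profile[prefix] = len(children)
            frontier.extend(prefix + (c,) for c in children)
    return profile


profile = branching_profile(FULL_MOVES)
print(len(FULL_MOVES))         # 6: children of the root when moves are not split
print(profile[()])             # 3: choose a piece first
print(profile[("knight",)])    # 3: then choose the knight's destination
```

In an MCTS over split moves, each of these partial-move nodes would presumably carry its own visit and value statistics, which is where the trade-offs the abstract mentions (efficiency, simulations, action-based heuristics) come into play.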

Modern French Poetry Generation with RoBERTa and GPT-2

Mika Hämäläinen, Khalid Alnajjar, Thierry Poibeau

Category: Natural Language Processing

2022-12-06

We present a novel neural model for modern poetry generation in French. The model consists of two pretrained neural models that are fine-tuned for the poem generation task. The encoder of the model is RoBERTa-based, while the decoder is based on GPT-2. This way, the model can benefit from the superior natural language understanding performance of RoBERTa and the good natural language generation performance of GPT-2. Our evaluation shows that the model can create French poetry successfully. On a 5-point scale, human judges gave the lowest score, 3.57, to the typicality and emotionality of the output poetry, and the best score, 3.79, to understandability.

Emotion Conditioned Creative Dialog Generation

Khalid Alnajjar, Mika Hämäläinen

Category: Natural Language Processing

2022-12-06

We present a DialoGPT-based model for generating creative dialog responses conditioned on one of the following emotions: anger, disgust, fear, happiness, pain, sadness and surprise. Given an input sentence and a desired emotion label, our model is capable of producing a contextually apt response. It expresses the desired emotion with an accuracy of 0.6. The best-performing emotions are neutral, fear and disgust. When measuring the strength of the expressed emotion, we find that the model expresses anger, fear and disgust most strongly.

Automatic Generation of Factual News Headlines in Finnish

Maximilian Koppatz, Khalid Alnajjar, Mika Hämäläinen, Thierry Poibeau

Category: Natural Language Processing

2022-12-05

We present a novel approach to generating news headlines in Finnish for a given news story. We model this as a summarization task in which a model is given a news article and must produce a concise headline describing the main topic of the article. Because there are no openly available GPT-2 models for Finnish, we first build such a model using several corpora. The model is then fine-tuned for the headline generation task using a massive news corpus. The system is evaluated by 3 expert journalists working at a Finnish media house. The results showcase the usability of the presented approach as a headline suggestion tool that facilitates the news production process.

Video Games as a Corpus: Sentiment Analysis using Fallout New Vegas Dialog

Mika Hämäläinen, Khalid Alnajjar, Thierry Poibeau

Category: Natural Language Processing

2022-12-05

We present a method for extracting a multilingual, sentiment-annotated dialog data set from Fallout New Vegas. The game developers have pre-annotated every line of dialog in the game with one of 8 sentiments: anger, disgust, fear, happy, neutral, pained, sad and surprised. The game has been translated into English, Spanish, German, French and Italian. We conduct experiments on multilingual, multilabel sentiment analysis on the extracted data set using multilingual BERT, XLM-RoBERTa and language-specific BERT models. In our experiments, multilingual BERT outperformed XLM-RoBERTa for most of the languages, and language-specific models were slightly better than multilingual BERT for most of the languages. The best overall accuracy was 54%, achieved by multilingual BERT on the Spanish data. The extracted data set presents a challenging task for sentiment analysis. We have released the data, including the training and testing splits, openly on Zenodo. The data set has been shuffled for copyright reasons.

Silo NLP's Participation at WAT2022

Shantipriya Parida, Subhadarshi Panda, Stig-Arne Grönroos, Mark Granroth-Wilding, Mika Koistinen

Category: Natural Language Processing

2022-08-02

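This paper provides the system description of the "Silo NLP" submission to the Workshop on Asian Translation (WAT2022). We participated in the Indic multimodal tasks (English->Hindi, English->Malayalam and English->Bengali multimodal translation). For text-only translation, we trained Transformers from scratch and fine-tuned mBART-50 models. For multimodal translation, we used the same mBART architecture and used object tags extracted from the images as visual features concatenated with the text sequence. Our submission tops many tasks, including English->Hindi multimodal translation (evaluation test), English->Malayalam text-only and multimodal translation (evaluation test), English->Bengali multimodal translation (challenge test) and English->Bengali text-only translation (evaluation test).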
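
The abstract states that object tags extracted from the image are concatenated with the text sequence. A minimal Python sketch of that input construction follows; the separator `" ## "`, the lowercasing and the de-duplication are assumptions, the object detector that produces the tags is not shown, and the resulting string would still have to be tokenized for mBART. `build_multimodal_source` is a hypothetical helper name.

```python
def build_multimodal_source(text: str, object_tags: list[str], sep: str = " ## ") -> str:
    """Append image object tags to a source sentence as extra input tokens.

    Assumptions (not from the paper): the separator `sep`, lowercasing, and
    dropping duplicate tags while keeping first-seen order.
    """
    tags = [t.strip().lower() for t in object_tags if t.strip()]
    unique_tags = list(dict.fromkeys(tags))  # de-duplicate, preserve order
    if not unique_tags:
        return text  # no visual features: fall back to text-only input
    return text + sep + " ".join(unique_tags)


# Tags would come from an object detector run on the image (not shown).
print(build_multimodal_source("A man is riding a horse.", ["man", "horse", "Horse", "grass"]))
# A man is riding a horse. ## man horse grass
```

Keeping the tags as plain tokens after a separator lets the same text-only mBART architecture consume the visual information without any change to the model itself.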

Multilingual Persuasion Detection: Video Games as an Invaluable Data Source for NLP

Teemu Pöyhönen, Mika Hämäläinen, Khalid Alnajjar

Category: Natural Language Processing

2022-07-10

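Role-playing games (RPGs) contain a considerable amount of text in their video game dialogue. Game developers quite often semi-annotate this text. In this paper, we extract a multilingual dataset of persuasive dialogue from several RPGs. We use a natural language processing (NLP) model called BERT to show the viability of this data for building a persuasion detection system. We believe that video games have a lot of untapped potential as a data source for various NLP tasks. The code and data described in this paper are available on Zenodo.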

Processing M.A. Castrén's Materials: Multilingual Typed and Handwritten Manuscripts

Niko Partanen, Jack Rueter, Mika Hämäläinen, Khalid Alnajjar

Category: Natural Language Processing

2021-12-28

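This study is a technical report on various tasks performed on the materials collected and published by the Finnish ethnographer and linguist Matthias Alexander Castrén (1813-1852). The Finno-Ugrian Society is publishing Castrén's manuscripts as new critical and digital editions, and at the same time different research groups are also paying attention to these materials. We discuss the workflows and technical infrastructure used, and consider how to create datasets that benefit different computational tasks, in order to further improve the usability of these materials and to aid the processing of similar archival collections. We focus on the parts of the collections that have been processed in a way that improves their usability in more technical applications, complementing earlier work on the cultural and linguistic aspects of these materials. Most of these datasets are openly available on Zenodo. The study points out specific areas that need further research and provides benchmarks for text recognition tasks.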

TFW2V: An Enhanced Document Similarity Method for the Morphologically Rich Finnish Language

Quan Duong, Mika Hämäläinen, Khalid Alnajjar

Category: Natural Language Processing

2021-12-23

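Measuring the semantic similarity of different texts has many important applications in Digital Humanities research, such as information retrieval, document clustering and text summarization. The performance of different methods depends on the length of the text, the domain and the language. This study focuses on experimenting with some of the current approaches for Finnish, a morphologically rich language. At the same time, we propose a simple method, TFW2V, which shows high efficiency in handling both long text documents and limited amounts of data. Furthermore, we design an objective evaluation method that can be used as a framework for benchmarking text similarity methods.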
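
The abstract does not describe how TFW2V works. Reading the name as a combination of TF-IDF and word2vec, one plausible sketch (an assumption, not the authors' method) represents each document as the TF-IDF-weighted average of its word vectors and compares documents by cosine similarity. The 3-dimensional `VECTORS` table and the helper names are hypothetical; a real setup would load trained Finnish embeddings and would likely lemmatize tokens first, given Finnish morphology.

```python
import math
from collections import Counter

# Hypothetical toy embeddings; a real setup would load trained Finnish word2vec vectors.
VECTORS = {
    "kissa": [1.0, 0.9, 0.0],  # "cat"
    "koira": [0.9, 1.0, 0.0],  # "dog"
    "auto": [0.0, 0.1, 1.0],   # "car"
}


def idf_weights(corpus):
    """Smoothed inverse document frequency for every token in a tokenized corpus."""
    n = len(corpus)
    df = Counter(tok for doc in corpus for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + d)) + 1.0 for tok, d in df.items()}


def doc_vector(doc, vectors, idf):
    """TF-IDF-weighted average of word vectors (assumed reading of TFW2V).

    Tokens without an embedding are skipped.
    """
    dim = len(next(iter(vectors.values())))
    total = [0.0] * dim
    weight_sum = 0.0
    for tok, tf in Counter(doc).items():
        if tok not in vectors:
            continue
        w = tf * idf.get(tok, 1.0)
        for i, x in enumerate(vectors[tok]):
            total[i] += w * x
        weight_sum += w
    return [x / weight_sum for x in total] if weight_sum else total


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Usage: two pet-related documents versus one car-related document.
docs = [["kissa", "koira"], ["koira", "kissa", "kissa"], ["auto"]]
weights = idf_weights(docs)
vecs = [doc_vector(d, VECTORS, weights) for d in docs]
print(cosine(vecs[0], vecs[1]) > cosine(vecs[0], vecs[2]))  # True
```

Because the document vector is an average, its size does not grow with document length, which is one reason such a representation could suit long texts; whether TFW2V relies on this property is not stated in the abstract.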

Automated Speech Scoring System Under The Lens: Evaluating and interpreting the linguistic cues for language proficiency

Pakhi Bamdev, Manraj Singh Grover, Yaman Kumar Singla, Payman Vafaee, Mika Hama, Rajiv Ratn Shah

Category: Natural Language Processing

2021-11-30

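English proficiency assessments have become a necessary metric for filtering and selecting prospective candidates in both academia and industry. With the rising demand for such assessments, it has become increasingly necessary to have automated, human-interpretable results in order to prevent inconsistencies and to ensure meaningful feedback to second language learners. Classical feature-based approaches are more interpretable with respect to what the scoring model learns. Therefore, in this work, we use classical machine learning models to formulate the speech scoring task as both a classification and a regression problem, followed by a thorough study to interpret the relationship between linguistic cues and the speaker's English proficiency level. First, we extract linguistic features in five categories (fluency, pronunciation, content, grammar and vocabulary, and acoustics) and train models to grade responses. In comparison, we find that the regression-based models perform as well as or better than the classification approach. Second, we perform ablation studies to understand the impact of each feature and feature category on proficiency grading performance. Further, to understand individual feature contributions, we present the importance of the top features for the best-performing algorithm on the grading task. Third, we use partial dependence plots and Shapley values to explore feature importance and conclude that the best-performing trained model learns the underlying rubrics used for grading the dataset in this study.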